3 research outputs found

    GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics

    Full text link
    Large scale graph processing is a major research area for Big Data exploration. Vertex centric programming models like Pregel are gaining traction due to their simple abstraction that allows for scalable execution on distributed systems naturally. However, there are limitations to this approach which cause vertex centric algorithms to under-perform due to poor compute to communication overhead ratio and slow convergence of iterative superstep. In this paper we introduce GoFFish a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex centric approach with the flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation.Comment: Under review by a conference, 201

    Distributed Programming over Time-series Graphs

    No full text
    Graphs are a key form of Big Data, and performing scalable analytics over them is invaluable to many domains. There is an emerging class of inter-connected data which accumulates or varies over time, and on which novel algorithms both over the network structure and across the time-variant attribute values is necessary. We formalize the notion of time-series graphs and propose a Temporally Iterative BSP programming abstraction to develop algorithms on such datasets using several design patterns. Our abstractions leverage a sub-graph centric programming model and extend it to the temporal dimension. We present three time-series graph algorithms based on these design patterns and abstractions, and analyze their performance using the GoFFish distributed platform on Amazon AWS Cloud. Our results demonstrate the efficacy of the abstractions to develop practical time-series graph algorithms, and scale them on commodity hardware
    corecore